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ABSTRACT 

Motivation: Reconstructing the topology of a gene regulatory 
network is one of the key tasks in systems biology Despite of 
the wide variety of proposed methods, very little work has been 
dedicated to the assessment of their stability properties. Here 
we present a methodical comparison of the performance of a 
novel method (RegnANN) for gene network inference based on 
multilayer perceptrons with three reference algorithms (ARACNE, 
CLR, KELLER), focussing our analysis on the prediction variability 
induced by both the network intrinsic structure and the available data. 
Results: The extensive evaluation on both synthetic data and a 
selection of gene modules of Escherichia coli indicates that all the 
algorithms suffer of instability and variability issues with regards to the 
reconstruction of the topology of the network. This instability makes 
objectively very hard the task of establishing which method performs 
best. Nevertheless, RegnANN shows MCC scores that compare very 
favorably with all the other inference methods tested. 
Availability: The software for the RegnANN inference algorithm is 
distributed under GPL3 and it is available at the corresponding author 

home page |http : / /mpba . fbk ■ eu/grimaldi/ regnann-supmat| 

together with the Supplementary Material. 
Contact: grimaldi@fbk.eu (Marco Grimaldi) 



1 INTRODUCTION 

Since the first examples dating back to early seventies ( iGlass and Kauffma 

the challenge of reconstructing the links among genes 
in a regulatory network starting from their expression signals has 



"Inferring gene networks is a daunting task", not only in terms of 
devising an effective algorithm, but also in terms of quantitatively 
interpreting the obtained results. Only recently efforts have been 
carried out towards an objective comparison of net work inference 
metho d s also highlighting occurring li mitations i Krishnan et al\ 
l l2007h ; lAltav' and Emmert- StreibI | |20 1 d) : iMarbach et al\ ( |20 1 Oh ) . 

This work compares four network reverse engineering methods, 
first settling in a controlled situation with synthetic data and 
then focusing on a biological setup by analysing transcriptional 
subnetworks of Escherichia coli. In order to simplify our 
comparative evaluation, we will only consider the underlying 
topology, thus neglecting both weight and direction of the links 
among the genes. In doing so, we confine the analysis of the 
reconstructed network in terms of the binary existence or not- 
existence of an edge. The general performance of the network 
inference task is evaluated in ter ms of Matthews Correlation 
Coefficient (MCC, iMatthewsl h975h - see Sup. Mat. for details). 
MCC is becoming the measure of choice in many application fields 
of machine learning and bioinformatics; it is one of the best methods 
for summarizing into a single value the confusion matrix of a binary 
classification task. Recently it has als o been used for comparing 
network topologies l lStokic el al] ( l2009h ). 

In this paper we introduce a novel inference method called 
Reverse Engineering Gene Networks with Artificial Neural 
Networks (RegnANN). This approach is based on an ensemble 
«f multilayer perceptrons trained using steady state data. Its 



been tackled by several laboratories worldwide. These initial efforts 
have originated a number of related publications which has been 
exponentially growing in the last few years. 

The inference methods generally employed are of very 
different nature, rang ing from de t ermini stic, e.g.: systems of 
differential equa t ions jBansal et al.\ ilOOH) ) and Groebner bases 
Dimitrova et al. o st ochastic approaches, e .g.: Boolean 



Kauffmanl ( ll993l) ) or Bayesian ^Friedman etall ( I2OO0I) ) algorithms 



Such approaches may also start from different types of gene 
expression data: time-course or steady states. Furthemore, also the 
detail and the complexity of the considered network can vary as 
the links may carry information about the direction of the relation 
(directed graph) and a weight may be ass ociated to the strenght 
of ea c h link (weighted directed graph) iMarkowetz and Spand 
( |2007|) :[ Karlebach and ShamiJ l l200^ . Generally, the reconstruction 
accuracy is far from being optimal in many situations with the 
presence of sever al pitfalls , relat ed to bo th the methods an d 
the available data jHe et al\ ( |2009|) ). Citing iBaralla etal] | |2009|) . 

*to whom correspondence should be addressed. 



■perfomanc e is compar e d with those of top- scoring methods such a s 
KELLER jSong et al\ j2009l) ). ARACNE ( iMargolin et aZI fcOOfih ) 
and CLR JFaith et al\ OOOTh ) while assessing possible sources of 
instability. To improve the general ef ficiency of RegnAN N we 
implement the algorithm using GPGPU jLahabaref a/1 l l2008l) ) 

The extensive evaluation on both synthetic and biological 
data indicates that the algorithms tested suffer of instability and 
variability issues with regards to the reconstruction of the network 
topology. The instability makes objectively very hard the task of 
establishing which method performs best. Nevertheless, RegnANN 
shows MCC scores that compare very favorably with all the other 
inference methods tested. 



2 METHODS 

2.1 RegnANN: network inference using ANN 

To infer gene regulatory networks we adopt an ensemble of feed-forward 
multilayer perceptrons (Bishop 1 1995j)) trained using the back-propagation 
algorithm. Each member of the ensemble is essentially a multi-variable 
regressor (one to many) trained using an input expression matrix to learn 
the relationships (correlations) among a target gene and all the other genes 
in the network. We proceed in determining the interactions among genes 
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sepai'ately and then we join the information to form the overall network. 
From each row of the gene expression matriiQ we build a set of input and 
output patterns used to train a selected multilayer perceptron. Each input 
pattern corresponds to the expression value for the selected gene of interest. 
The output pattern is the row-vector of expression values for all the other 
genes for the given row in the gene expression matrix {Figure[T}. By cycling 
through all the rows in the matrix, each regressor in the ensemble is trained 
to learn the correlations among one gene and all the others. Repeating the 
same procedure for all the columns in the expression matrix, the ensemble 
of multi-variable regressors is trained to learn the correlations among all the 
genes. 

The procedure of determining separatel y the intera c tions among genes 
is very similar to the one presented in ISong et al\ |200g| ), where the 
authors propose to estimate the neighborhood of each gene (the correlations 
among one gene and all the others) independently and then joining these 
neighborhoods to form the overall network, thus reducing the problem to a 
set of identical atomic optimizations (Section [2.2t . 

Here we build A*^ - one for each of the genes in the network - multilayer 
perceptrons with one input node, one layer of hidden nodes and one layer 
of A'^ — 1 output nodes. The input node takes the expression value of 
the selected gene rescaled in [—1, 1]. The number of hidden nodes is set 
empirically to the square root of the number of inputs by the number of 
outputs, resulting in ^^N — 1. The activation function is the hyperbolic 
tangent, which provides output values in the range [—1,1], thus making 
the output values interpretable in terms of positive correlation (+1), anti- 
correlation (—1) and not-correlated (0). The other parameters used to learn 
each multi-layer perceptron are as follows: learning rate equal to 0.01; 
momentum equal to 0.1, learning epochs equal to 10000; bias equal to 

Finally, the topology of gene regulatory networks is obtained by applying 
a second procedure. The correlation of each gene with all the others 
is extracted by passing a purposely made test pattern to the regressor: 
considering separately each multilayer perceptron in the ensemble, a value 
of 1 is passed to its input neuron, consequently recording its output values. 
In this way, the correlation between the corresponding gene with all the 
others is obtained as a vector of values in [—1,1]. By cycling through all 
the members of the regression system, we obtain the adjacency matrix of 
the sought gene network. It is important to note that this procedure does 
not allow discovering of gene self correlation (regulation) patterns, but only 
coiTelation patterns among different genes. Moreover, the algorithm here 
proposed canno t estima t e futu re values, because it is not a predictor, as in the 
case of GR NN |S pechll fT993l) : instead it models static correlations between 
genes. As in ISong et g/.l 12003) . it is possible to extend the regression system 
to take into account dynamic rewiring of the topology, but this is beyond the 
scope of the present work. 

To improve the general efficiency of the algorithm and thus allow a 
systematic comparison of its performance with the other gene network 
reverse engineering methods tested (Subsection 12. 2t , we implemented the 
ANN based regression system using the G PGPU programming paradigm 
jLahabar et Q/1l2008l) ; IScanzio et a/.N201(]|) ). 

2.2 Alternative inference methods 

As reference methods we select three alternative algorithms widely used in 
literature: ARACNE, CLR and KELLER. 



KELL ER: it is a kernel-reweighted logistic regression method iSong et al\ 
i2009l) ) introduced for reverse engineering the dynamic interactions between 
genes based on the time series of their expression values. It estimates the 
neighborhood of each gene separately and then joins the neighborhoods 
to form the overall network. The approach aims at reducing the network 



In this work we consider gene expression matrices of dimension M X N: 
N genes whose expression levels are recorded AI times. 
^ These values are evaluated empirically during preliminary tests on 
synthetic data. 
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Fig. 1. The ad hoc procedure proposed to build the training input/output 
patterns starting from a gene expression matrix. Each input pattern 
coiTesponds to the expression value for the selected gene of interest. The 
corresponding output pattern is the vector of expression values for all the 
other genes for the given row in the gene expression matrix. 



inference problem to a set of identical atomic optimizations. KELLER 
makes use of the Zi -regularized logistic regression algorithm and operates 
modeling the distribution of interactions between genes as a binary pair-wise 
iVIarkov Random Field. The method has been applied to reverse engineer 
genome-wide interactions taking place during the life cycle of Drosophila 
melanogaster. Although KELLER has been developed to uncover dynamic 
rewiring of gene transcription networks (e.g.: dynamic changes in their 
topology), here we consider constant network topology for a given gene 
expression matrix. In this w ork we make use of the reference implementation 
of the algoiithm provided in ISong et al\ |2009|) . 

ARACNE: it is a general method able to addres s a wide range of netw ork 
deconvolution probl ems - from transcriptio nal iMargolin et al\ |200(|)) to 
metabolic networks jNemenman et all (iooj) - that was originally designed 
to scale up to the complexity of regulatory networks in mammalian cells. 
The method makes use of an information theoretic approach to eliminate 
the majority of indirect interactions inferred by co-expression methods. 
ARACNE removes the vast majority of indirect candidate interactions 
using a w ell-known information the oretic property: the data processing 
inequality iCover and Thoma j jl99lh ). In this work we use the reference 
implementation of the algorithm provided in lMever et al\ i2008l) with default 
value for the data processing inequality tolerance parameter. 

CLR: it is an ex tension of the relevance networks class of algorithms 
IFaith et al l j2007h ). which predicts regulations between transcription factors 
and genes making use of the mutual information score. CLR proposes 
an adaptive background correction step that is added to the estimation 
of mutual information. For each gene, the statistical likelihood of the 
mutual information score is computed within its network context. Then, 
for each transcription factor-target gene pair, the mutual information score 
is compared to the context likelihood of both the transcription factor 
and the target gene, and turned into a z- score. We adopt th e reference 
implementation of the algorithm provided in iMever et al\ <2008ll . 

2.3 Experimental protocol 

We are interested in comparing the performance of the selected reverse 
engineering methods in inferring the und erlying topology of regulatory 
networks. As proposed in lSong et al\ j2009l) . we focus on the estimation of 
the interaction structures between genes, rather than the strength of these 
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interactions. The inferred adjacency matrix is symmetric and discretized 
with values in {0, 1} by thresholding. 

The binarization of the inferred network obtained with RegnANN is 
achieved using by using a thr eshold value of 0.5 . In the case of KELLER, 
the reference implementation iSong et al\ )2009h ) returns a symmetric and 
discrete (with values in {0, 1}) adjacency matrix - binaiization is obtained 
by rounding values bigger than 10"'^ to 1. Results obtained with ARACNE 
are discretized as in the case of KELLER. Usually, the cutoff value for 
the mutual information is estimated fo r each data -set separately using a 
significance measure (e.g.: the F-score jAltav and~E mmert-Streibl ilOlOl) )) 
or bui ldin g a Precision-Recall cui-ve and selecting the desired thi'eshold 
value (Margolin et al. 1 200^)). Here, the threshold value is kept constant 
to avoid the introduction of a selection bias in the outcome of the ARACNE 
algorithm. The same procedure is applied to CLR (threshold value of 10^''). 

The accuracy (in terms of MCC) of the inference methods is firstly 
evaluated on synthetic data (Section [3) by varying the topology of the 
network, its size, the amount of data available, the method adopted to 
synthesize the data and the method adopted to normalize the data prior to 
network inference - see Supplementary Material for details. Methodically, 
we vary one parameter at a time and then measure the performance of 
the systems as the mean of 10 randomly initiahzed runs. For each run, 
the network topology is randomly generated with the desired number of 
genes (N), the expression profiles - the data - are (randomly) generated the 
required number of times (A/), the selected normaUzation method is applied 
and the MCC values for the applied reverse engineering method recorded. 
The error of the measurement is expressed as twice the standard deviation of 
the 10 independent runs. 

Finally, the performance of the four network inference algorithms 
is tested on 7 select e d gen e network modules of Escherichia coli 
jPereerin-Alvarez et al\ <2009l) V While ARACNE, CLR and KELLER 
are deteiministic algorithm^ RegnANN may produce different results 
depending on the random initialization of the weights in the ensemble 
of multi-layer perceptrons. Thus, in order to smooth out possible local 
minima, we adopted a majority voting schema: for each network module, 
the RegnANN algorithm is applied 10 times and the inferred adjacency 
matrices accumulated. The final topology is obtained selecting those links 
that appeared with a frequency higher than 7 (out of 10). The entire 
procedure is repeated 10 times and the final prediction is estimated as the 
mean and the associated en'or as twice the standard deviation of the 10 
independent runs. 



3 DATA 

Synthetic data: we benchmark the reverse engineering algorithms 
here considered using both synthetic and biological data. Synthetic 
data are obtaine d considering two di f ferent network topologies: 
Barabasi- Albert jSarabasi and Alber and Erdos-Renyi 

terdos and Renvil {195^^ Furthermore, we apply two different 
gene expression synthesis methods: the first one considers only 
linear correlation among selected genes (SLC), the second one is 
based on a gene network/expression simulator r ecently proposed 
to ass ess reverse engineering algorithms (GES, lOi Camillo et al\ 
See Supplementary Material for full details. 

Escherichia coli data: the task for the biological experiments 
is the inference of a few transcriptional subnetworks of the 
model organism Escherichia coli starting from a set of steady 
state gene expression data. The data are obtained from different 
sources and they consist of three different elements, namely the 
whole Escherichia coli transcriptional network, the set of the 

^ Given a particular input, the algorithm will always produce the same 
output, always passing through the same sequence of states. 



transcriptional subnetworks and the gene expression profiles to infer 
the subnetworks from. The Escherichia coli transcriptional network 
is extracted from the RegulonDB0 database, version 6.4 (2010) and 
it consists of 3557 experimentally confirmed regulations between 
1442 genes, amongst whi ch 172 transcripti on factors. The 117 
subnetworks are defined in iMarref al\ b.Ql(h : in our experiments 
we use 7 of these subnetworks, including a number of genes ranging 
from 7 to 1 04. T he expression data have been originally used in 
iFaith et al. 1 l l2007h and consist of 445 Escherichia coli Affymetrix 
Antisense2 microarray expression profiles for 4345 genes, collected 
under different experimental conditions such as PH changes, growth 
phases, antibiotics, heat shock, varying oxygen concentrations and 
numerous genetic perturbations. MAS5 preprocessing is chosen 
among the available options (MAS5, RMA, gcRMA, DChip). 



4 RESULTS 

Due to space constraints, hereafter we present a selection of the 
outcomes of the experimental evaluation with emphasis on the 
reconstruction variability; for previous usag e of M C C in network 
theory and applications see IStokic et al. ( l2009h : ISupper et al\ 
l l2007l) . 



Synthetic data: Figure[2]illustrates the MCC scores obtained with 
ARACNE, CLR, KELLER and RegnANN for synthetic Barabasi 
networks (scale free, exponent P — 1), varying the number of 
nodes. In order to provide similar amount of information to the 
inference algorithms while varying the size of the network, we 
kept constant the data ratio: the number of expression profiles to 
number of nodes (80%) - e.g.: 50 nodes, 40 different expression 
profiles; 200 nodes 160 different expression profiles. Expression 
values are linearly rescaled in [—1, 1]. Figure |2] indicates that the 
MCC scores on Barabasi networks depend on both the inference 
algorithm and the data synthesis methods, while the size of the 
network (number of nodes considered) has a somewhat smaller 
impact on the performance. RegnANN-GES scores 0.5 ±0.1 on a 
network of 200 nodes, while RegnANN-SLC scores 0.34 ± 0.08 on 
a similarly sized network. KELLER scores 0.4 ±0.1 irrespective 
of the data synthesis method applied on the 200 nodes network. 
On the same sized network, ARACNE-GES scores 0.42 ± 0.04 
while ARACNE-SLC scores 0.28 ± 0.06. Finally, CLR shows the 
worst performance of the four algorithms tested, in'espective of the 
network size and the data synthesis adopted, e.g.: 0.17±0.02 (GES) 
for a network of 200 nodes - 0.18 ± 0.01, in the case of SLC. 

Figure [3] shows the MCC scores for the same network inference 
methods as above, varying the number of expression profiles 
considered while keeping constant the size of the Barabasi network 
(100 nodes). Expression values are statistically normalized. Figure 
[3] indicates that the MCC scores greatly vary when considering 
statistically normalized values while varying the amount of data 
generated (the number of expression profiles). The data synthesis 
method adopted can also greatly affect the performance score. MCC 
scores for RegnANN, ARACNE and CLR show to be positively 
affected when the number of generated expression profiles is 
increased from 10 to 40: RegnANN-GES scores 0.20 ± 0.02 
considering only 10 profiles, while scoring 0.50 ± 0.08 with 40 
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Fig. 2. MCC scores of the different network inference algorithms for 
synthetic Barabasi networks (scale free, exponent P = 1), varying the 
number of nodes and keeping constant the data ratio: the number of 
expression profiles to the number of nodes (80%). Both methods (GES 
and SLC) for data synthesis are considered. Expression values are linearly 
rescaled in [—1,1]. 



different. Adopting SLC data synthesis, RegnANN scores 0.07 ± 
0.04 and 0.24±0.06 with 10 and 40 expression profiles respectively. 
Similarly, ARACNE-GES scores 0.28 ± 0.06 and 0.35 ± 0.04 
with 10 and 40 expression profiles respectively. ARACNE-SLC 
scores 0.15 ± 0.08 and 0.31 ± 0.06 with 10 and 40 expression 
profiles respectively. On the other hand, as also shown in Figure 
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Fig. 3. MCC scores of the different network inference algorithms for 
synthetic Barabasi networks (scale free, exponent P = 1), varying number 
of expression profiles and constant number of nodes (100). Both methods 
(GES and SLC) for data synthesis are considered. Expression values are 
statistically normalized (zero mean and unit standard deviation). 
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Fig. 4. MCC scores of the different network inference algorithms for 
synthetic Barabasi networks (scale free, exponent P = 1), varying 
data normalization method (Discretization, Linear Rescahng and Statistical 
Normalization) and constant network size (200 nodes) and number of 
expression profiles generated (160). Only SLC data synthesis is considered. 



(2] CLR shows performance curves that are not influenced by the 
data synthesis method adopted: it scores 0.13 ± 0.01 (GES) with 
10 expression profiles; 0.10 ± 0.04 synthesizing data with SLC. 
With 40 expression profiles CLR-GES scores 0.22 ± 0.04;CLR- 
SLC scores 0.21 ± 0.04. On the contrary, Figure [3] shows that 
the performance of KELLER is greatly influenced by the data 
synthesis method, while the number of expression profiles has a 
somewhat limited impact: KELLER scores 0.44±0.06 synthesizing 
expression profiles with GES (40 in total), it scores 0.18 ± 0.02 
using SLC to generate 40 profiles. 

Figure |4] shows the MCC scores obtained with ARACNE, CLR, 
KELLER and RegnANN by varying data normalization methods 
while keeping constant the network size (200 nodes) and the 
number of expression profiles generated (160). Only the SLC 
data synthesis is considered. Figure |4] indicates that ARACNE, 
CLR and RegnANN MCC scores are not significantly affected 
- considering the error of the measure - by the normalization 
method: RegnANN scores 0.42 ± 0.06, 0.4 ± 0.1 and 0.4 ±0.1 
applying respectively discretization, linear rescaling and statistical 
normalization to the data. Similarly, ARACNE scores 0.24 ± 0.04, 
0.28 ± 0.03 and 0.28 ± 0.03 when the expression values are 
discretized, linearly rescaled and statistically normalized. Finally, 
CLR scores: 0.14 ± 0.04, 0.17 ± 0.01 and 0.17 ± 0.01 for the very 
same normalization methods above (discretization, linear rescaling, 
statistical normalization). On the other hand, KELLER MCC scores 
show to be highly influenced by the normalization method applied 
to the synthetic data. In the case of discretization and in the case of 
statistical normalization KELLER scores 0.10 ± 0.01 and 0.19 ± 
0.01 respectively. In the case of linear rescaling it scores a higher 
value: 0.40 ± 0.07. 

Figure [5] shows the MCC scores obtained with ARACNE, 
CLR, KELLER and RegnANN for synthetic Erdos-Renyi networks 
(random graph, mean degree D — 1), varying the number of nodes. 
In order to provide similar amount of information to the inference 
algorithms while varying the size of the network, we kept constant 
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Fig. 5. MCC scores of the different network inference algorithms for 
synthetic Erdos-Renyi networks (random graph, mean degree D = 1), 
varying the number of nodes and keeping constant the data ratio: the number 
of expression profiles to the number of nodes (80%). Both methods (GES 
and SLC) for data synthesis are considered. Expression values are linearly 
rescaled in [—1,1]. 



Fig. 6. MCC scores of the different network inference algorithms for 
synthetic Erdos-Renyi networks (random graph, mean degree D = 1), 
varying the number of expression profiles and keeping constant the number 
of nodes (100). Both methods (GES and SLC) for data synthesis are 
considered. Expression values are statistically normalized (zero mean and 
unit standard deviation). 



the data ratio (80%). Expression values are linearly rescaled in 
[—1,1]. In the case of Erdos-Renyi networks the MCC curves 
are greatly and unevenly affected by all the parameters explored: 
inference method, size of the network and data synthesis method. 
ARACNE and CLR show a decreasing MCC score - although 
not strictly statistically significant - when the number of nodes in 
the network is increased from 50 to 200: ARACNE-GES scores 
0.29 ± 0.08 with network size 50, 0.25 ± 0.04 with network size 
200. Similarly, CLR-GES scores 0.19 ± 0.06 with network size 
50, 0.11 ± 0.02 with network size 200 - a similar negative trend is 
recorded in case of SLC data synthesis. On the other hand, KELLER 
and RegnANN have higher MCC when the number of nodes in 
the network is increased from 50 to 200: KELLER-SLC scores 
0.39 ± 0.08 network size 50, 0.65 ± 0.08 when the network size 
is 200. Similarly, RegnANN-SLC scores 0.4 ± 0.1 and 0.64 ± 0.04 
for network size 50 and 200 respectively. Considering GES for 
synthetic data generation, the MCC curves are significantly different 
for both KELLER and RegnANN: KELLER scores 0.37 ± 0.04 for 
network size 200 while RegnANN scores 0.20 ± 0.04 for similarly 
sized networks (200 nodes). 

Figure |6] shows the MCC scores for the same network inference 
methods as above, varying the number of expression profiles 
considered while keeping constant the size of the Erdos-Renyi 
network (100 nodes). Expression values are statistically normalized. 
As indicated in Figure (6) KELLER and RegnANN show opposite 
MCC curves by increasing the amount of expression profiles 
generated. RegnANN-GES shows rapidly increasing scores varying 
the number of expression profiles from 10 to 80: 0.12 ± 0.02 
and 0.6 ± 0.1 respectively. KELLER-GES scores 0.28 ± 0.06 
and KELLER-SLC scores 0.13 ± 0.04 with 10 expression profiles. 
KELLER-GES scores 0.16±0.01 and KELLER-SLC scores 0.17± 
0.04 with 80 expression profiles. On the other hand, MCC curves for 



CLR are limitedly affected by the number of expression profiles or 
by the data generation methodology: with 80 expression profiles it 
scores 0.15 ± 0.02 using GES and 0.16 ± 0.02 using SLC for data 
synthesis. 

Figure |7] shows the MCC scores obtained with ARACNE, CLR, 
KELLER and RegnANN varying data normalization method while 
keeping constant the network size (200 nodes) and the number of 
expression profiles generated (160). Only the SLC data synthesis 
is considered. As in the case of Barabasi networks (Figure |4ll. 
Figure |7] shows that ARACNE, CLR and RegnANN MCC scores 
are not significantly affected by the normalization method. On the 
contrary, KELLER is significantly affected: it scores 0.11 ± 0.01 
and 0.15 ± 0.02 when the expression values are discretized and 
statistically normalized respectively. A higher value of 0.65 ± 0.08 
is recorded in the linearly rescaled case. 

Selected Escherichia coli subnetworks: Table [T] summarizes the 
results obtained on a sel e ction o f Escherichia coli gene subnetworks 
jPeregrin- Alvarez et al. 1 j2009l) ) for the four inference algorithms. 
Gene expression values are linearly rescaled in [—1, 1]. 

As for the case of synthetic data, Table[T]indicates great variability 
of the MCC scores across the different network modules for all 
the inference methods tested. ARACNE scores range from 0.78 
(module 81) to 0.00 (module 88). CLR values range between 0.45 
and 0.02 for module 81 and 96 respectively. KELLER scores range 
between 0.63 and —0.12 (module 12 and module 81 respectively). 
Finally RegnANN scores range between 0.32 ± 0.000 (module 12) 
and -0.05 ± 0.02 (module 88). It is interesting to note that the 
MCC score varies unevenly for the different inference algorithms 



' In this case the en'or associated to the measure is 0: the very same result 
is obtained for all repetitions. 
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Fig. 7. MCC scores of the different network inference algorithms for 
synthetic Erdos-Renyi networks (random graph, mean degree D = 1), 
varying data normalization method (Discretization, Linear Rescahng and 
Statistical Normalization) and constant network size (200 nodes) and 
number of expression profiles generated (160). Only SLC data synthesis is 
considered. 

Table 1. MCC scores of the different network inference algorithms 
on selected Escherichia coli network modules 



ID 


D 


NN 


NL 


A. 


CLR 


K. 


R.ANN 


Err 


81 


0.245 


7 


12 


0.78 


0.45 


-0.12 


0.4 


0.1 


6 


0.189 


13 


32 


0.13 


0.29 


0.02 


0.3 


0.1 


12 


0.180 


10 


18 


0.43 


0.42 


0.63 


0.32 


0.00 


75 


0.133 


16 


34 


0.10 


0.24 


0.10 


0.23 


0.08 


88 


0.100 


19 


36 


0.00 


0.17 


-0.07 


-0.05 


0.02 


96 


0.001 


104 


18 


0.08 


0.02 


0.00 


0.00 


0.01 


94 


0.000 


81 


2 


0.09 


0.02 


0.15 


0.026 


0.001 



Colum n ID indicates the id of the network module as in lPeregrin-Alvarez et all 
|20Q9|) . D the density of the module (the ratio of the number of links to the 
square of the number of nodes). NN the number of nodes in the module. LN 
the number of links. Column A. shows results for ARACNE, Column K. results 
for KELLER and R.ANN the results for RegnANN. Column Err refers to the 
error associated to the MCC score of RegnANN and it is calculated as twice the 
standard deviation of 10 independent runs. 



with respect to the module network density (the ratio of the number 
of links to the square of the number of nodes), e.g.: ARACNE scores 
0.13 on module 6 (density D = 0.189) and scores 0.43 on module 
12 (density D = 0.189). On the same two modules, CLR scores 
0.29 and 0.39 respectively while KELLER scores 0.02 and 0.63. 
On the other hand, RegnANN are more homogeneous: it scores 
0.3 ±0.1 and 0.32 ± 0.00 on module 6 and module 12 respectively. 



5 DISCUSSION 

The analysis of the results obtained on synthetic data shows 
that the performance of the inference methods are highly and 
unevenly influenced by the simulation parameters: the topology 
of the network and its size, the method used to synthesize the 



expression values and the raw data normalization step adopted. The 
dependency of the results on all the parameters of the simulations 
makes objectively very hard the task of establishing which method 
performs best. Generally, RegnANN shows performance scores 
that compare very favorably with all the other inference method 
tested. The solution based on ANN provides good results on 
both Barabasi and Erdos-Renyi networks varying the number of 
expression profiles synthesized (Figure[3]and Figure[6l(, and it shows 
stable MCC scores with regards to the different data normalization 
adopted (Figure |4] and Figure [7]l. The evaluation on synthetic data 
indicates that CLR is the most stable inference method with regards 
to variations in the network topology and in the data synthesis, 
although it shows MCC scores that compare unfavorably with the 
other methods. On the other hand, ARACNE compares favorably 
with KELLER and RegnANN in terms of MCC, showing also 
stability with regards to the different data normalization adopted 
(Figure |4] and Figure [7]l. On the contrary, KELLER shows a great 
deal of variability in the MCC scores with regards to the different 
data normalization methods: Figure|4]and Figure|7]indicate that this 
algorithm performs best when the expression pro files are l inearl y 
rescaled in [—1, 1]. These results suggest that in lSong et al\ ^20091) , 
the algorithm may be not be performing at its best since the author 
discretized the expression values in { — 1, 1}. 

The results on the Escherichia coli gene network modules confirm 
that the inference algorithms tested show great variability in the 
MCC scores, suggesting that the correctness of the inferred network 
depends on the topological properties of the modules (the very same 
expression values are used t o infer the different gene sub-netw orks), 
in accordance to findings in lAltav and Emmert-StreibI ( |2010|) . 

The great deal of variability in the results for both synthetic and 
real-world data indicates that each inference method can potentially 
select the correct network topology - or the inconect one - 
depending on a number of factors which may not be limited to the 
relative small set of parameters explored here. With regard to this, 
we lastly verify possible stability issues of the network inference 
algorithms related to the re-generation of synthetic data for a given 
sample network topology. Table |2] shows the results obtained on 
a sample Barabasi network and a sample Erdos-Renyi network 
(fixed topology, 100 nodes data ratio 80%) by applying each 
inference algorithm 10 times on 10 different simulated expression 
values (SLC method). Column Accuracy indicates the mean MCC 
score for the 10 inferred adjacency matrices with respect to the 
ground-truth (A.Err is the associated error calculated as twice the 
standard deviation of the mean). Column Stability indicates the 
mean distance among all the inferred topologies: a value equal to 
1 indicates perfect stability, e.g.: the same topology is reconstructed 
all the times (similarly, indicates random results) - S.Err is 
the associated error calculated as twice the standard deviation of 
the mean. Table [2] suggests again that all the methods suffer of 
problems related to the variability of the inferred network topology: 
no method shows a stability score close to 1. KELLER scores 
best on the Barabasi network (stability of 0.58 ± 0.06, accuracy 
0.40 ± 0.07). Both KELLER and RegnANN score best on the 
Erdos-Renyi network (a stability of about 0.27, an accuracy of about 
0.47). 
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Table 2. Accuracy [MCC] scores in network topology inference for 
tlie different reverse engineering algoritlims and their stability [MCC] . 



Synthetic Barabasi Network 










Accuracy [MCC] 


A.Err 


Stability [MCC] 


S.Err 


ARACNE 


0.31 


0.04 


0.21 


0.04 


CLR 


0.23 


0.02 


0.24 


0.03 


KELLER 


0.40 


0.07 


0.58 


0.06 


RegnANN 


0.37 


0.06 


0.42 


0.07 


Synthetic Erdos-Renyi Network 










Accuracy [MCC] 


A.Err 


Stability [MCC] 


S.Err 


ARACNE 


0.39 


0.02 


0.17 


0.03 


CLR 


0.16 


0.01 


0.07 


0.03 


KELLER 


0.47 


0.07 


0.27 


0.05 


RegnANN 


0.49 


0.06 


0.28 


0.05 



Column Accuracy indicates the mean MCC score in reconstructing the target 
network topology, column A.Eit the associated eiTor. Column Stability indicates 
the mean distance [MCC] among all the infeiTed topologies, column S.Eit the 
associated enor. 



6 CONCLUSION 

In this work we presented a novel method for network inference 
based on an ensemble of multi-layer perceptrons configured as 
multi- variable regressor (RegnANN). We compared its performance 
to the performance of three different network inference algorithms 
(ARACNE, CLR and KELLER) on the task of reverse engineering 
the gene network topology, in terms of the associated IVICC score. 

Our extensive evaluation indicates that all the algorithms suffer 
of instability in the reconstruction of the network topology due 
to the various sources of variability, possibliy not limited to the 
relative small set of parameters explored here. Because of such 
instability, it is objectively very difficult to establish which method 
performs best. Generally, the newly introduced RegnANN shows 
performance scores that compare very favorably with all the other 
inference methods tested. Nonetheless further efforts are required in 
order to effectively cope with the difficulty of the task and minimize 
the variability of the inference process. 
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APPENDIX 

1 GENE EXPRESSION NORMALIZATION 

Generally, in microarray experiments, the analysis of the raw data is 
often hampered by a number of technical and statistical problems. 
The possible remedies usually lie in appropriate preprocessing 
steps, proper normalization of the data and application of statistical 
testing procedures in the deriva tion of differentially expressed genes 
dSteinhoffand VingrorJ | |2006|) ). Although many of the real-world 
issues in data preprocessing and normalization do not apply here, we 
are interested in verifying how discretization and rescaling - some 
of the most common (and possibly simple) steps taken to normalize 
the raw data - can impact the accuracy of the network inference 
algorithms here considered. 



1.1 Discretization 

It is often the case that a number of sources of noise can be 
introduced into the microarray measurements, e.g. during the 
stage of hybridization, digitization and normalization. Therefore, 
it is often preferred to consider only th e qualitative level of gene 
expression rather than its actual value jSong etaL\ ( l2009h ): gene 
expression is modeled as either being up-regulated (-1-1) or down- 
regulated ( — 1) by comparing the give n value to a threshold. For 
example, in iTuna and NiranianI ( |2009|) it is shown that binarizing 
gene expression data leads to classification outcomes very similar 
to the results obtained on real-valued data. 

In this work we compute the discrete value of the expression 
for each of the genes at each of the M steps as the sign of the 
difference of the expression values of the given gene at step m and 
step m — 1. 



1.2 Rescaling 

Generally, when a scaling method is applied to the data, it is 
assum ed that different sets of i ntensi ties differ by a constant global 
factor iSteinhoff and VingronI ( l2006h ). It may also happen that the 
rescaling is a necessary step due to the inference method adopted, 
as in the case of SVM (Support Vector Machine) or ANN (Artificial 
Neural Network) classification/regression. 

In this work we test two different data rescaling methods: 

• linear rescaling: each gene expression column-vector is 
linearly rescaled between [—1, 1]; 

• statistical normalization: each gene expression column- vectoj^ 
is rescaled such that its mean value is equal to and the 
standard deviation equal to 1. 



2 PERFORMANCE METRIC 

When the performance of a network inference method is evaluated, 
it is common practice to adopt two metrics: precision and recall. 
Recall indicates the fraction of true interactions correctly inferred by 
the algorithm, and is estimated according to the following equation: 



Recall — 



TP 



(1) 



TP + FN 

where TP indicates the fraction of true positives, while FN 
indicates the fraction of false negatives. 

On the other hand, precision measures the fraction of true 
interactions among all inferred ones, and it is computed as: 



Precision = 



TP 



(2) 



TP + FP 

where FP indicates the ratio of false positives. 

In this work we adopt ins t ead t he Matthew s corr elation 
coefficients - MCC l lBaldi et al\ ( |2000|) : iMatthewsl ( 1 19751) ): this 
is a measure that takes into account both true/false positives and 
true/false negatives and it is generally regarded to as a balanced 
measure, useful specially in the case of unbalanced classes (i.e.: not 
equal number of positive and negative examples). 

The MCC is in essence a correlation coefficient between the 
observed and predicted binary classifications: it returns a value 
between —1 and -1-1. A coefficient value equal to -f 1 represents 
a perfect prediction, indic ates an average ran d om prediction whi le 
-1 an inverse prediction jSaldi et al.\ i l2000l) : iMatthewj l ll975l) ). 
In the context of network topology inference the observed class is 
the true network adjacency matrix, while the predicted class is the 
inferred one. 

The Matthews correlation coefficient has the following is 
obtained according to the following equation: 



MCC = 



TP ■ TN - FP ■ FN 



^(TP + FP) (TP + FN) (TN + FP) (TN + FN) 



(3) 



Recently MCC has also be e n used for com paring network 
topologies jSupper et all ( l2007l) : lstokic el al\ JioOft) ). 



3 SYNTHETIC DATA GENERATION 

The synthetic data sets used in the main paper are obtained starting 
from an adjacency matrix describing the selected topologjQ In 
this work we consider undirected graphs: we are interested in 
estimating the structures of interaction between nodes/genes, rather 
than the detailed strength or the direction of these interactions. 
Thus, we consider only symmetric and discrete adjacency matrices, 
representing with a value of 1 the presence of a link between two 
nodes. A value equal to in the adjacency matrix indicates no 
interaction. 

3.1 Network Topology 

Here we consider two differe nt network topologies : Barabasi-Albert 
Barabasi and Alber and Erdos-Renyi jErdos and Renvil 

19591)). Figure [8] shows two sample network topologies: left, 

Barabasi Network with 100 nodes (power-law exponent P equal to 

1); right, Erdos-Renyi network, 100 nodes and average degree (D) 

equal to 0.92. 

Once the topology of the network is (randomly) generated, 
the output profiles of each node are generated according to the 
approaches in the following section. 



* In this work we consider gene expression matrices of dimension M X N: 
N genes whose expression levels are recorded M times. 



^ The network graph is generated using the igraph extension package to the 
GNU R project for Statistical Computing. 
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3.2 Gene Expression Synthesis 

3.2.1 Simple Linear Correlation (SLC): similarly to the simulation 
of gene expression data pr e sented in the supplementary material 
of iLangfelder and HorvathI ilOOj) . we consider a set of seed 
expressions (a matrix M x N - N genes which expression profiles 
are recoded A/ times - with values uniformly distributed in [-1, 1]) 
and the desired topology expressed by the adjacency matrix adjM 
(N X N). The gene expression profiles (gep, a matrix M x A'^) are 
calculated as: 



gep — seed + seed ★ adjM (4) 

where the symbol '+' indicates element-element summation and 
the symbol indicates row-column matrix multiplication. With 
this method, the seed expression columns are linearly correlated 
(correlation equal to 1) with the columns of the same matrix as 
described by the discrete input adjacency matrix adjM. 



3.2.2 Gene Expression Simulator ( GES): this second methodology 
is based on a gene network simulator recently proposed to assess 
reverse engineering algorithms |Pi Camillo ef <;/. (200^). Given 
an input adjacency matrix, the network simulator uses fuzzy logic 
to represent interactions among the regulators of each gene and 
adopts differen t ial eq uations to generate continuous data. As in 
iMargolin et al\ ( |2006|) . we obtain synthetic expression values of 
each gene n (n = 1, . . . , A'^) by simulating its dynamics until the 
expression value reaches its steady state. We obtain M different 
values for each gene by repeating the process M times and recording 
the expression value at steady state. The synthesis of each gene 
profile is randomly initialized by the simulator. 




Fig. 8. Sample network topologies: left, Barabasi Network with 100 nodes (power-law exponent P equal to 1); right Erdos-Renyi network, 100 nodes and 
average degree (D) egual to 0.92. 
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